A Performance Measure for Classification with Ambiguous Data
Abstract
Real-world data can be difficult to classify because classes overlap in regions of ambiguous data. One solution is to leave out data before classifying; another is to classify the data first and then prune the results that are ambiguous. In either case, the difficulty lies in determining which data are ambiguous. In this paper we propose a performance criterion that gives a precise basis for characterizing the performance of any classifier applied to ambiguous data. Further, we demonstrate that there is an optimal region for withholding classifications, which depends on the performance criterion. We test our method on some benchmark classification problems to show the effectiveness of the approach.
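The idea of classifying first and then withholding ambiguous results can be illustrated with a small sketch. The abstract does not give the criterion itself, so everything here is an assumption made for illustration: the function name `reject_option_score`, the cost parameters `cost_wrong` and `cost_withheld`, the scoring rule (reward correct answers, penalize errors and, more lightly, withheld cases), and the toy data. It is not the paper's measure.

```python
def reject_option_score(probs, labels, threshold, cost_wrong=1.0, cost_withheld=0.2):
    """Hypothetical score for a binary classifier that may withhold a decision.

    probs     -- predicted probability of class 1 for each sample
    labels    -- true labels (0 or 1)
    threshold -- minimum confidence required to issue a classification
    Each sample contributes +1 if classified correctly, -cost_wrong if
    classified incorrectly, and -cost_withheld if the decision is withheld.
    Returns the mean contribution over all samples.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        confidence = max(p, 1.0 - p)      # confidence in the more likely class
        if confidence < threshold:
            total -= cost_withheld        # ambiguous: withhold the decision
        elif (p >= 0.5) == (y == 1):
            total += 1.0                  # confident and correct
        else:
            total -= cost_wrong           # confident but wrong
    return total / len(labels)


# Toy data (made up): four clear cases and four near the decision boundary.
probs = [0.95, 0.90, 0.55, 0.45, 0.10, 0.52, 0.48, 0.05]
labels = [1, 1, 0, 1, 0, 0, 1, 0]

for threshold in (0.5, 0.6, 0.99):
    print(threshold, reject_option_score(probs, labels, threshold))
```

On this toy data the intermediate threshold scores highest: classifying everything pays for the errors near the boundary, while withholding everything forfeits the easy cases. Where the best threshold lies depends on the assumed costs.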
Similar resources
Ensemble Classification and Extended Feature Selection for Credit Card Fraud Detection
With the rise of technology, the possibility of fraud in areas such as banking has increased. Credit card fraud is a crucial problem in banking, and its danger is ever increasing. This paper proposes an advanced data mining method that considers both feature selection and decision cost to improve the accuracy of credit card fraud detection. After selecting the best and most effec...
Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms
In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...
Clustering Gene Expression Data Using Random Forest Dissimilarity
Background: The clustering of gene expression data plays an important role in the diagnosis and treatment of cancer. These kinds of data typically involve a large number of variables (genes) in comparison with the number of samples (patients). Many clustering methods have been built based on the dissimilarity among observations, which is calculated by a distance function. As increa...
Using SVM for Classification in Datasets with Ambiguous Data
One of the challenges in machine learning is the classification of datasets with ambiguous instances. In this paper we study specifically datasets with examples that have overlapping feature values for different classes. In these circumstances there is a bound on the classification performance. While there seems to be a race for accuracy, very little has been done to understand and solve the is...
Improving Chernoff criterion for classification by using the filled function
Linear discriminant analysis is a well-known matrix-based dimensionality reduction method. It is a supervised feature extraction method used in two-class classification problems. However, it is incapable of dealing with data in which classes have unequal covariance matrices. To address this issue, the Chernoff distance is an appropriate criterion to measure distances between distributions. In the p...
Publication date: 1999